Transposition Invariant String Matching

Authors

  • Veli Mäkinen
  • Gonzalo Navarro
  • Esko Ukkonen
Abstract

Veli Mäkinen (a), Gonzalo Navarro (b), and Esko Ukkonen (a)

(a) Department of Computer Science, P.O. Box 26 (Teollisuuskatu 23), FIN-00014 University of Helsinki, Finland.
(b) Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile.

Given strings $A = a_1 a_2 \ldots a_m$ and $B = b_1 b_2 \ldots b_n$ over an alphabet $\Sigma \subseteq U$, where $U$ is some numerical universe closed under addition and subtraction, and a distance function $d(A, B)$ that gives the score of the best (partial) matching of $A$ and $B$, the transposition invariant distance is $\min_{t \in U} \{ d(A + t, B) \}$, where $A + t = (a_1 + t)(a_2 + t) \ldots (a_m + t)$. We study the problem of computing the transposition invariant distance for various distance (and similarity) functions $d$, including Hamming distance, longest common subsequence (LCS), Levenshtein distance, and their versions where the exact matching condition is replaced by an approximate one. For all these problems we give algorithms whose time complexities are close to the known upper bounds without transposition invariance, and for some we achieve these upper bounds. In particular, we show how sparse dynamic programming can be used to solve transposition invariant problems, and its connection with multidimensional range-minimum search. As a byproduct, we give improved sparse dynamic programming algorithms to compute LCS and Levenshtein distance.
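To make the definition concrete, the following is a minimal brute-force sketch in Python, not the sparse dynamic programming algorithms the abstract refers to. It assumes integer sequences, equal lengths for the Hamming case, and exact matching; the function names `ti_hamming`, `lcs_length`, and `ti_lcs` are hypothetical. It relies on the observation that only transpositions of the form t = b_j - a_i can produce any match, so those are the only candidates worth trying.

```python
from collections import Counter


def ti_hamming(a, b):
    """Transposition invariant Hamming distance of two equal-length sequences.

    Position i matches under transposition t exactly when t == b[i] - a[i],
    so the best t is the most frequent positionwise difference.
    """
    if len(a) != len(b):
        raise ValueError("Hamming distance needs sequences of equal length")
    if not a:
        return 0
    diffs = Counter(y - x for x, y in zip(a, b))
    return len(a) - max(diffs.values())


def lcs_length(a, b):
    """Length of the longest common subsequence (textbook O(mn) DP)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def ti_lcs(a, b):
    """Transposition invariant LCS length by trying every candidate t.

    Naive baseline: up to m*n candidate transpositions, each costing O(mn).
    """
    candidates = {y - x for x in a for y in b}
    return max((lcs_length([x + t for x in a], b) for t in candidates), default=0)


if __name__ == "__main__":
    print(ti_hamming([1, 2, 3, 5], [4, 5, 6, 9]))   # t = 3 matches three positions
    print(ti_lcs([60, 62, 64], [67, 1, 69, 71]))    # t = 7 aligns all of A
```

The brute-force LCS variant is only meant as a reference point for checking faster implementations on small inputs.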

Similar resources

Matching Numeric Strings under Noise

Abstract. A numeric string is a sequence of symbols from an alphabet $U$, where $U$ is some numerical universe closed under addition and subtraction. Given two numeric strings $A = a_1 \ldots a_m$ and $B = b_1 \ldots b_n$ and a distance function $d(A, B)$ that gives the score of the best (partial) matching of $A$ and $B$, the transposition invariant distance is $\min_{t \in U} \{ d(A + t, B) \}$, where $A + t = (a_1 + t)(a_2 + t) \ldots$ (a_m ...

A Note on Randomized Algorithm for String Matching with Mismatches

Abstract. Atallah et al. [ACD01] introduced a randomized algorithm for string matching with mismatches, which utilized fast Fourier transformation (FFT) to compute convolution. It estimates the score vector of matches between a text string and a pattern string, i.e. the vector obtained when the pattern is slid along the text and the number of matches is counted for each position. In this paper, we ...

Faster than FFT: Rotation

In this article we consider the rotation invariant template matching problem from the combinatorial point of view. The problem is to find the places and orientations in an image where a pattern can be superimposed so that it is similar enough to the image. The traditional approach to this problem uses the Fast Fourier Transformation. We present a combinatorial approach that is inspired by string mat ...

Publication date: 2008